library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)
library(lme4)
library(rstatix)
options(scipen=999)
df = read.csv(here::here('avg_tensor_by_roi_wide.csv'),
colClasses = c('subject' = 'factor',
'site' = 'factor',
'visit' = 'factor')) %>%
rename(scan = visit)
outcomes = df %>%
  select(where(is.numeric)) %>%
  colnames()
The purpose of the DTIMSCAMRAS study is to test for site effects on DTI metrics of different brain regions in the context of a traveling-subjects design with a harmonized imaging protocol. Participants (n=11) with stable Multiple Sclerosis (MS) were scanned at 4 different locations: the National Institutes of Health (NIH), Brigham and Women’s Hospital (BWH), Johns Hopkins University (Hopkins), and the University of Pennsylvania (Penn). Two scans were collected per site visit, and participants were re-positioned between scans. Penn, BWH, and NIH used Siemens scanners, while Hopkins used a Philips scanner. The modalities collected include T1-weighted images (MPRAGE), FLAIR, and diffusion-weighted images (DWI).
To preprocess imaging data, qsiprep was used for bias correction, skull-stripping, and co-registration of the T1s and DWIs, as well as specialized denoising and artifact correction for the DWIs. Then, DTIFIT was used to compute the following scalar maps:

- Fractional Anisotropy (FA)
- Mean Diffusivity (MD)
- Radial Diffusivity (RD)
- Axial Diffusivity (AD)
Concurrently, several segmentation pipelines were used on the T1s (plus FLAIRs in the case of MIMOSA) to define particular regions of interest (ROIs):

- ATROPOS: GM and WM
- FAST: GM and WM
- JLF: GM, WM, and Thalamus
- FIRST: Thalamus
- MIMOSA: Lesion
The segmentation results were used to average each scalar map across all voxels belonging to each ROI. The 36 resulting metrics (9 segmentation-ROI combinations times 4 scalar maps) make up the set of outcome variables that are the focus of the site-effects test. The outcome variables are listed below.
data.frame(OUTCOME = outcomes) %>%
separate(OUTCOME, c("SEGMENTATION", "ROI", "SCALAR"), remove = FALSE)
Note the general naming scheme: each outcome is named SEGMENTATION_ROI_SCALAR.
A permutation-based F-test was used to test for site effects. For each subject, the absolute differences between all possible pairs of intra-site and inter-site measures were computed. For instance, take the first row of the subject data.frame below, which has BWH as the “reference site”: the absolute differences in \(y\) between that row and the 6 non-BWH rows are computed. This process is repeated for all rows, and the results are averaged to arrive at the average inter-site difference for this subject. For the average intra-site difference, the absolute differences of the 4 intra-site pairs (one per site) are averaged.
df %>%
filter(subject == '01001') %>%
select(!where(is.numeric), ATROPOS_GM_AD) %>%
rename(y = ATROPOS_GM_AD)
These differences were then averaged across all subjects, yielding the Mean Absolute Difference Between Sites (\(\overline{MAD_{B}}\)) and the Mean Absolute Difference Within Sites (\(\overline{MAD_{W}}\)). The test statistic is their ratio:
\[ F = \frac{\overline{MAD_{B}}}{\overline{MAD_{W}}} \]
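As a concrete sketch of the pairwise-difference step, consider one subject with hypothetical `y` values (base R only; the values below are made up for illustration):

```r
# Hypothetical subject table: 4 sites x 2 scans, one metric y
sub_df <- data.frame(
  site = rep(c("BWH", "Hopkins", "NIH", "Penn"), each = 2),
  y    = c(1.00, 1.02, 1.10, 1.08, 0.98, 1.01, 1.05, 1.03)
)

# All 28 unordered pairs of scans; flag the pairs that cross sites
pairs <- combn(nrow(sub_df), 2)
d     <- abs(sub_df$y[pairs[1, ]] - sub_df$y[pairs[2, ]])
inter <- sub_df$site[pairs[1, ]] != sub_df$site[pairs[2, ]]

mad_b <- mean(d[inter])   # subject-level average inter-site difference (24 pairs)
mad_w <- mean(d[!inter])  # subject-level average intra-site difference (4 pairs)
```

Averaging over the 24 unordered inter-site pairs gives the same value as the row-by-row scheme described above, since each pair is simply counted twice in the latter. These subject-level quantities are then averaged across subjects before forming the ratio \(F\).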
For each metric, the observed test statistic was compared to a distribution of test statistics derived from 10000 permutations (i.e. a null distribution). The proportion of null results higher than the observed test statistic served as a preliminary p-value \(p_0\). To prevent p-values of 0 (when the observed statistic was higher than all permuted results), the p-value was shifted using the following formula:
\[ p = \frac{10000*p_0 + 1}{10000 + 1}\]
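The shift can be written as a small helper (a sketch; `perm_p` is a hypothetical name, and the number of permutations is taken from the length of the null vector rather than hard-coded to 10000):

```r
# Shifted permutation p-value: proportion of permuted statistics above the
# observed one, adjusted so that p can never be exactly 0
perm_p <- function(obs, null) {
  n  <- length(null)       # e.g. 10000 permutations
  p0 <- mean(null > obs)   # preliminary p-value
  (n * p0 + 1) / (n + 1)
}

perm_p(2, c(1, 1, 3, 1))  # p0 = 1/4, so p = (4 * 0.25 + 1) / 5 = 0.4
```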
Source code for this project can be found on GitHub.
Subjects 03001 and 03002 did not complete imaging at all 4 sites. Subject 03001 completed imaging at Penn and BWH only and 03002 completed imaging at BWH, Hopkins, and the NIH only.
Subject 02001 is missing DTI data from their NIH visit.
t_df <- df %>%
count(subject, site, .drop = FALSE) %>%
tidyr::pivot_wider(names_from = site, values_from = n) %>%
mutate(`Row Totals` = BWH + Hopkins + NIH + Penn)
t_df <- t_df %>%
summarize(across(where(is.integer), sum)) %>%
bind_rows(t_df, .) %>%
mutate(across(where(is.factor), as.character))
t_df[is.na(t_df)] <- "Column Totals"
t_df
Atropos segmentation failed for 6 images: 04001NIH01, 01003NIH01, 03002NIH02, 04003NIH01, 04001BWH02, and 03001BWH01.
knitr::include_graphics(here::here(c('misc/atropos_success.png', 'misc/atropos_fail.png')))
Left: Example of successful Atropos segmentation; Right: Example of a failed Atropos segmentation with one ROI missing
The following table excludes the failed Atropos segmentations. Note that subject 03001 has only 3 images after this exclusion.
t_df <- df %>%
unite(sub, subject, site, scan, sep = '-') %>%
filter(!sub %in% c('04001-NIH-01', '01003-NIH-01', '03002-NIH-02', '04003-NIH-01', '04001-BWH-02', '03001-BWH-01')) %>%
separate(sub, c('subject', 'site', 'scan')) %>%
mutate_if(is.character, as.factor) %>%
count(subject, site, .drop = FALSE) %>%
tidyr::pivot_wider(names_from = site, values_from = n) %>%
mutate(`Row Totals` = BWH + Hopkins + NIH + Penn)
t_df <- t_df %>%
summarize(across(where(is.integer), sum)) %>%
bind_rows(t_df, .) %>%
mutate(across(where(is.factor), as.character))
t_df[is.na(t_df)] <- "Column Totals"
t_df
The failed Atropos segmentations are included in the following visualizations but were excluded from the site-effects tests.
Densities are colored by site, while the black line represents the overall density aggregated across sites. Distributions from the different sites tend to cluster together, except for JLF Thalamus. Additionally, note the long tails caused by outliers in the ATROPOS distributions.
plot_densities <- function(var){
  df %>%
    ggplot(aes(x = .data[[var]], color = site)) +
    geom_density() +
    stat_density(aes(x = .data[[var]]), geom = "line", color = "black") +
    theme_bw() +
    xlab(str_replace_all(var, "_", " "))
}
vars <- df %>%
select(ends_with('FA')) %>%
colnames()
dplots <- lapply(vars, plot_densities) %>%
setNames(vars)
for (name in names(dplots)){
cat("#####", str_replace_all(name, "_", " "), "\n")
print(dplots[[name]])
cat("\n\n")
}
Box plots show differences in medians across sites for most variables.
Additionally, note the severe outliers in the ATROPOS metrics, which correspond to the failed segmentations. For FIRST THALAMUS, subject 01002 could be considered an outlier; however, their FA values are consistent across sites and scans, suggesting this variation is meaningful and that the subject should be retained.
plot_avg_tensor_values <- function(tensor){
df %>%
select(!where(is.numeric), ends_with(tensor)) %>%
pivot_longer(ends_with(tensor), names_to = 'seg', values_to = 'average') %>%
separate(seg, into = c('segmentation', 'roi', 'tensor')) %>%
unite(segmentation, segmentation, roi, sep = " ") %>%
mutate(segmentation = factor(segmentation,
levels = c("ATROPOS GM", "ATROPOS WM", "FAST GM", "FAST WM",
"JLF GM", "JLF WM", "JLF THALAMUS", "FIRST THALAMUS", "MIMOSA LESION"))) %>%
ggplot(aes(x=site, y=average)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(color = subject, shape=scan), alpha=0.5) +
facet_grid(segmentation ~ .) +
coord_flip() +
ggtitle(sprintf("Mean %s in ROI by site", tensor)) +
theme_bw() +
xlab("") +
ylab("") +
theme(strip.text.y = element_text(angle = 0)) +
scale_x_discrete(expand = c(0.15, 0.15))
}
plot_avg_tensor_values('FA')
For Mean Diffusivity, distributions vary more widely across sites. In particular, MIMOSA LESION MD values are quite different for Hopkins data.
vars <- df %>%
select(ends_with('MD')) %>%
colnames()
dplots <- lapply(vars, plot_densities) %>%
setNames(vars)
for (name in names(dplots)){
cat("#####", str_replace_all(name, "_", " "), "\n")
print(dplots[[name]])
cat("\n\n")
}
As before, medians appear to be different across sites.
plot_avg_tensor_values('MD')
Radial Diffusivity also shows some variation across sites (though perhaps less than MD). JLF GM RD at NIH shows multiple peaks, and JLF WM RD shows skewed distributions for all sites. As before, the MIMOSA metric shows a different distribution for Hopkins data.
vars <- df %>%
select(ends_with('RD')) %>%
colnames()
dplots <- lapply(vars, plot_densities) %>%
setNames(vars)
for (name in names(dplots)){
cat("#####", str_replace_all(name, "_", " "), "\n")
print(dplots[[name]])
cat("\n\n")
}
As before, medians appear to be different across sites.
plot_avg_tensor_values('RD')
For Axial Diffusivity, Hopkins distributions show marked differences across most ROIs compared to other sites. The difference is quite pronounced for MIMOSA.
vars <- df %>%
select(ends_with('AD')) %>%
colnames()
dplots <- lapply(vars, plot_densities) %>%
setNames(vars)
for (name in names(dplots)){
cat("#####", str_replace_all(name, "_", " "), "\n")
print(dplots[[name]])
cat("\n\n")
}
As before, medians show differences, particularly for Hopkins.
plot_avg_tensor_values('AD')
# replace ATROPOS measures with NA for select images (segmentation failed)
atropos_cols <- df %>%
select(contains('ATROPOS')) %>%
colnames()
df[(df$subject == '04001' & df$site == 'NIH' & df$scan == '01') |
(df$subject == '01003' & df$site == 'NIH' & df$scan == '01') |
(df$subject == '03002' & df$site == 'NIH' & df$scan == '02') |
(df$subject == '04003' & df$site == 'NIH' & df$scan == '01') |
(df$subject == '04001' & df$site == 'BWH' & df$scan == '02') |
(df$subject == '03001' & df$site == 'BWH' & df$scan == '01'), atropos_cols] <- NA
The permutation-based F-test was carried out as previously described. The observed F-statistic for each of the metrics is plotted below as a black dot, while the distribution of permuted test statistics is shown as a white violin plot. All metrics showed significant site effects.
ratio_stats = readRDS(here::here("results/avg_tensor_by_roi_wide_ratio_stat.rds"))
null_dist = readRDS(here::here("results/avg_tensor_by_roi_wide_null.rds"))
null_dist %>%
pivot_longer(everything()) %>%
ggplot(aes(x=name, y=value)) +
geom_violin(draw_quantiles = c(0.95)) +
geom_point(aes(x=name, y=value),
data = ratio_stats %>%
pivot_longer(everything())) +
coord_flip()
The observed values (“black dots”) are represented below in table form along with their unadjusted and adjusted p-values:
test_ratio_stat <- function(ratio.stats, null.dists){
  # For each metric, find the percentile of the observed statistic
  # within its permutation null distribution
  sapply(colnames(ratio.stats),
         function(col) ecdf(null.dists[, col])(ratio.stats[, col]))
}
p = 1 - test_ratio_stat(ratio_stats, null_dist)
p_df <- data.frame(p_val = (10000*p + 1)/(1+10000))
perm_df <- cbind(
ratio_stats %>%
pivot_longer(everything(), names_to = 'outcome', values_to = 'observed_statistic'),
p_df %>%
mutate(p_val_fdr = p.adjust(p_val, method = 'fdr'))
)
rownames(perm_df) <- NULL
perm_df
The test was then repeated with the Hopkins data (the only Philips site) excluded.
ratio_stats = readRDS(here::here("results/avg_tensor_by_roi_wide_no_Hopkins_ratio_stat.rds"))
null_dist = readRDS(here::here("results/avg_tensor_by_roi_wide_no_Hopkins_null.rds"))
null_dist %>%
pivot_longer(everything()) %>%
ggplot(aes(x=name, y=value)) +
geom_violin(draw_quantiles = c(0.95)) +
geom_point(aes(x=name, y=value),
data = ratio_stats %>%
pivot_longer(everything())) +
coord_flip()
The observed values (“black dots”) are represented below in table form along with their unadjusted and adjusted p-values:
p = 1 - test_ratio_stat(ratio_stats, null_dist)
p_df <- data.frame(p_val = (10000*p + 1)/(1+10000))
perm_df <- cbind(
ratio_stats %>%
pivot_longer(everything(), names_to = 'outcome', values_to = 'observed_statistic'),
p_df %>%
mutate(p_val_fdr = p.adjust(p_val, method = 'fdr'))
)
rownames(perm_df) <- NULL
perm_df